Abstract

A large number of explicit estimators are proposed in this paper for loss rate estimation in a network of the tree topology. All of the estimators are proved to be unbiased and consistent, rather than only asymptotically unbiased as shown in [1] for a specific estimator. In addition, a set of formulae is derived for the variances of various maximum likelihood estimators; the formulae unveil the connection between the path of interest and the subtrees connecting the path to the observers. Using the formulae, we are able not only to rank the estimators proposed so far, including those proposed in this paper, but also to identify the errors made in previous works. More importantly, the formulae let us easily identify the most efficient explicit estimator from a pool, which makes model selection feasible in loss tomography.


Index Terms 
I. Introduction

Loss tomography has been studied for a number of years, and a large number of estimators have been proposed for networks of the tree topology [1]-[10]. Almost all of the proposed estimators rely on an iterative procedure, such as the expectation-maximization (EM) or the Newton-Raphson algorithm, to approximate the solution of a likelihood equation that can be a high degree polynomial. Using approximation to solve a high degree polynomial has been widely criticised for its computational complexity, which increases with the number of descendants attached to the link or path to be estimated [5]. Because of this, there has been a persistent interest in the research community in finding an explicit estimator that performs as well as the iterative approach. Apart from explicit estimators, there are other issues in loss tomography that have not been solved. One of them is the theoretical variance of the estimates obtained by a maximum likelihood estimator (MLE). As far as we are aware, no credible result has been reported in a general form for the variances of MLEs, although some expressions, e.g. those in [1], have been presented and used in the literature. Those expressions were obtained under specific conditions or assumptions and therefore cannot be used to evaluate the performance of an estimator. This paper addresses these two issues and provides positive answers to both.

There have been a number of attempts to propose explicit estimators. All of them aim at estimating the pass rate of a path, not a link, connecting the root of a network of the tree topology to an internal node, since there is a bijection between the pass rates of the paths and the pass rates of the links in such a network. However, due to the lack of a clear strategy in the search for explicit estimators, all of the attempts are preliminary and produce few theoretical results. Apart from this, some of the results reported in previous works, including those presented in [1, 8, 11], are incorrect or incomplete, either because the nature of the estimation is not well understood or because unrealistic assumptions are used.

To complete the analyses and correct the mistakes stated above, we have undertaken a thorough and systematic investigation of the estimators proposed for loss tomography, aiming to identify the statistical principles and strategies that have been used or can be used in the tree topology. The investigation shows that all of the estimators proposed previously rely on observed correlations to infer the pass rates. The most popular strategy is to use all of the available correlations in estimation, as the maximum likelihood estimator (MLE) proposed in [2] does, which directly results in high degree polynomials as the likelihood equations. Nevertheless, the qualities of the correlations, measured by the fitness between a correlation and the corresponding observation, are different: some fit better than others. By using a small portion of high-quality correlations instead of all available ones, we can have an explicit estimator that performs at least as well as the MLE proposed in [2]. This paper is devoted to those findings and contributes to loss tomography in four ways:


• A large number of explicit estimators are proposed on the basis of composite likelihood [12]; each selects some correlations from the available ones to estimate loss rates. The estimators are divided into three groups: the block-wised estimators (BWE), the reduce scaled estimators (RSE), and the individual based estimators (IBE).

• The estimators in the three groups are proved to be unbiased rather than asymptotically unbiased as proved in [1]. A set of formulae is derived for the variances of the estimators in RSE and IBE, plus the MLE proposed in [2]. The formulae show that the variance of a loss rate estimator can be exactly expressed by the pass rate of the path of interest and the pass rate of the subtrees connecting the path of interest to the observers of interest. The formulae also show the weakness of the result obtained in [1] by the delta method.

• The efficiencies of the estimators in IBE are compared with each other on the basis of the Fisher information, which shows that an estimator considering a correlation involving a few observers is more efficient than one considering a correlation involving many. Therefore, the estimator proposed in [1] is the least efficient one. A similar conclusion is obtained for the estimators in BWE.

• Using the formulae, we are able to identify an efficient estimator by examining the end-to-end pass rates, which makes model selection not only possible but also feasible. A number of simulations are conducted to verify this finding.

The rest of the paper is organised as follows. In Section II we briefly introduce the previous works on explicit loss rate estimators and point out their weaknesses. In Section III the loss model, the notations, and the statistics used in this paper are presented. In Section IV we derive the MLE considering all available correlations for networks of the tree topology. We then decompose the MLE into a number of components according to correlations and derive a number of likelihood equations for the components in Section V. A statistical analysis of the proposed estimators is presented in Section VI, which details their statistical properties, including the formulae to calculate the variances of various estimators. Given the large number of estimators, model selection is introduced in Section VII, where a strategy based on the proposed formulae is presented and a number of simulations are conducted to verify its feasibility. Section VIII is devoted to concluding remarks.
II. Related Works

Multicast Inference of Network Characters (MINC) pioneered putting the ideas proposed in [13] into practice, where a Bernoulli model is used to model the loss behaviour of a link. Using this model, the authors of [2] derive an estimator of the pass rate of a path connecting the source to a node. The estimator is expressed as a polynomial whose degree is one less than the number of descendants of the node [2-4]. To ease the concern of using a numeric method to solve a higher degree polynomial (> 5), the authors of [1] propose an explicit estimator and claim that the estimator has the same asymptotic variance, to first order, as that obtained by the estimator proposed in [2]. However, the claim is questionable because there has been no result about the variance of an estimator, including the MLE proposed in [2], and the result obtained in [1] is based on an unrealistic assumption, i.e. that the loss rate of a link is very small. Under this assumption, almost all of the estimators proposed so far can achieve the same or better performance than that proposed in [1]. In addition, the variance of the MLE used in the comparison in [1] is also unrealistic because such a variance can only be obtained either by direct measurement or by letting the pass rate of the subtree rooted at the end of the path being estimated equal or approach 1.

In contrast to [1], [8, 11] propose an estimator that converts a general tree into a binary one and subsequently turns the likelihood equation into a quadratic equation of $A_k$ that is solvable analytically. Experiments show the estimator performs better than that in [1] since it uses more information in estimation. However, apart from experimental results, there is little theoretical analysis to demonstrate why it is better than that proposed in [1]. In addition, although the author of [11] proves the estimator is an MLE, it is not clear whether the MLE proposed in [11] is the same as that proposed in [2].


III. Assumption, Notation and Sufficient Statistics 
To make the following discussion simple and rigorous, we need to use a large number of symbols that may overwhelm readers who are not familiar with loss tomography. To assist them, the symbols are introduced gradually through the paper: the frequently used symbols are introduced in the following two subsections and the others are introduced when needed. In addition, the most frequently used symbols and their meanings are presented in Table I for quick reference.


A. Assumption 

To make loss tomography possible, probing packets, called probes, are multicast from a source or a number of sources located on one side of a network to a number of receivers located on the other side of the network, where the paths connecting the sources to the receivers, via some routers, cover the links of interest. Statistical inference relies on the network topology and the correlation observed by the receivers to estimate the pass rate of the path shared by the paths from the source to the receivers. If a network does not support multicast, unicast-based multicast can be used to achieve the same effect as multicast [14]. If the probes sent from sources to receivers are far apart and network traffic remains statistically stable during the measurement, the observations are considered to be independent and identically distributed (i.i.d.). In addition to probing, the losses occurring on a link or between links are assumed to be i.i.d. as well.


B. Notation 

Let $T = (V, E)$ be the multicast tree used to dispatch probes from a source to a number of receivers, where $V = \{v_0, v_1, \ldots, v_m\}$ is a set of nodes and $E = \{e_1, \ldots, e_m\}$ is a set of directed links that connect the nodes in $V$. By default $v_0$ is the root node of the multicast tree to which the source is attached. The set of leaf nodes $R$, $R \subset V$, represents all receivers attached to $T$. If $f(i)$ is used to denote the parent of node $i$, there is a correspondence between nodes and links, where $e_i$ is the link connecting $v_{f(i)}$ to $v_i$. For instance, $e_1$ is the link connecting the parent of $v_1$, i.e. $v_0$, to $v_1$.

A multicast tree can be decomposed into a number of multicast subtrees at each of the internal nodes, where $T(i)$ denotes the subtree that has $e_i$ as its root link and $R(i)$ denotes the receivers attached to $T(i)$. In addition, we use $d_i$ to denote the descendants attached to node $i$, which is a nonempty set if $i \notin R$. If $x$ is a set, $|x|$ denotes the number of elements in $x$ and $|d_i|$ denotes the number of descendants in $d_i$. For example, Figure 1 shows a complete binary multicast tree, where $R = \{v_8, v_9, \ldots, v_{15}\}$, $R(2) = \{v_8, v_9, v_{10}, v_{11}\}$, $d_2 = \{4, 5\}$, and $|d_2| = 2$.

If $n$ probes are sent from $v_0$ to $R$ in an experiment, each of them gives rise to an independent realisation of the passing (loss) process $X$. Let $X^{(i)}, i = 1, \ldots, n$, denote the $i$-th process, where $x_k^{(i)} = 1, k \in V$, if probe $i$ reaches $v_k$; otherwise $x_k^{(i)} = 0$. The sample $Y = (x_j^{(i)})_{j \in R}^{i \in \{1, \ldots, n\}}$ comprises the observations of an experiment and can be divided into a number of sections according to $R(k)$, where $Y_k$ denotes the part of $Y$ obtained by $R(k)$. If we use $y_j^i$ to denote the observation of receiver $j$ for probe $i$, we have $y_j^i = 1$ if probe $i$ is observed by receiver $j$; otherwise, $y_j^i = 0$.

Instead of using the loss rate of a link as the parameter to be estimated, the pass rate of the path connecting $v_0$ to $v_k, k \in \{1, \ldots, m\}$, is used as the parameter and denoted by $A_k$. The empirical value of the parameter is equal to the number of probes that arrive at node $k$ divided by the number of probes sent from the source, i.e. $n$. Given $A_k, k \in V \setminus v_0$, we are able to compute the pass rates of all links in $E$. If $\alpha_k$ denotes the pass rate of link $k$, we have

$$\alpha_k = \frac{A_k}{A_{f(k)}}. \tag{1}$$

Given $\alpha_k$, $\bar{\alpha}_k = 1 - \alpha_k$ is the loss rate of link $k$.
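As an illustration of (1), the following minimal sketch converts path pass rates into link pass rates and loss rates. The four-node tree, the dictionary layout, and the function name `link_pass_rates` are assumptions made only for this example.

```python
def link_pass_rates(path_rate, parent):
    """Return alpha_k = A_k / A_{f(k)} for every non-root node k, as in (1)."""
    rates = {}
    for k, a_k in path_rate.items():
        if k == 0:  # the root v_0 has no incoming link
            continue
        rates[k] = a_k / path_rate[parent[k]]
    return rates


# Hypothetical tree: v_0 -> v_1 -> {v_2, v_3}; A_0 = 1 by convention.
parent = {1: 0, 2: 1, 3: 1}
path_rate = {0: 1.0, 1: 0.9, 2: 0.72, 3: 0.81}

alpha = link_pass_rates(path_rate, parent)
loss = {k: 1.0 - a for k, a in alpha.items()}  # loss rate of each link
```

The mapping is one-to-one, which is why estimating the path pass rates $A_k$ is enough to recover every link loss rate.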


C. Statistics 

To estimate $A_k$ from end-to-end measurement, we need a likelihood function to connect the i.i.d. model defined previously to the observation obtained in an experiment. The MLE proposed previously considers all of the probes observed by $R(k)$, which can be expressed as

$$n_k(d_k) = \sum_{i=1}^{n} \bigvee_{j \in R(k)} y_j^i, \quad k \in \{1, \ldots, m\}, \tag{2}$$

that is, the number of probes reaching node $k$ according to the observation of $R(k)$, called the confirmed arrivals at node $k$. To write a likelihood of $A_k$ for $n_k(d_k)$, $\beta_k$ and $\gamma_k$ are introduced to denote the pass rate of the subtrees rooted at node $k$ and the pass rate of the special multicast tree that connects $v_0$ to $R(k)$ via node $k$, respectively. Clearly $\gamma_k = A_k \cdot \beta_k, k \in \{1, \ldots, m\}$, and $\hat{\gamma}_k = \frac{n_k(d_k)}{n}$ is the empirical value of $\gamma_k$. Note that $\hat{\gamma}_j = \frac{1}{n}\sum_{i=1}^{n} y_j^i, j \in R$, is the empirical pass rate of the path from the root to node $j$.
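The statistic in (2) is a logical OR over the receivers of $R(k)$ followed by a count over probes. A minimal sketch, assuming observations are stored as one dictionary per probe keyed by receiver index (a layout chosen here for illustration only):

```python
def confirmed_arrivals(obs, receivers_of_k):
    """Count probes seen by at least one receiver in R(k), as in equation (2)."""
    return sum(1 for probe in obs if any(probe[j] for j in receivers_of_k))


# obs[i][j] = 1 if receiver j observed probe i (hypothetical data, n = 5).
obs = [
    {8: 1, 9: 0, 10: 0},
    {8: 0, 9: 0, 10: 1},
    {8: 0, 9: 0, 10: 0},
    {8: 1, 9: 1, 10: 0},
    {8: 0, 9: 0, 10: 0},
]
n = len(obs)
n_k = confirmed_arrivals(obs, receivers_of_k=[8, 9])  # R(k) = {8, 9}
gamma_hat_k = n_k / n  # empirical value of gamma_k
```

A probe seen by both receivers (the fourth one here) is still counted once, which is the property used later to argue that the statistic is minimal.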

Given the assumptions made at the beginning of this section and the above definitions, the likelihood function of $A_k$ for the observation obtained by $R(k)$, i.e. $n_k(d_k)$, can be created as follows:

$$P(A_k \mid Y) = (A_k\beta_k)^{n_k(d_k)}(1 - A_k\beta_k)^{n - n_k(d_k)}. \tag{3}$$

Given (3), we can prove that $n_k(d_k)$ is a sufficient statistic with respect to (wrt.) the passing process of $A_k$ for the observation obtained by $R(k)$. Rather than using the well known factorisation theorem in the proof, we directly use the mathematical definition of a sufficient statistic (see definition 7.18 in [15]) to achieve this. The definition wrt. the statistical model defined for the passing process is presented as a theorem here:

Fig. 1. A Multicast Tree

Theorem 1: Let $Y = \{X^{(1)}, \ldots, X^{(n)}\}$ be a random sample, governed by the probability function $p_{A_k}(Y)$. The statistic $n_k(d_k)$ is minimal sufficient for $A_k$ in respect of the observation of $R(k)$.

Proof: According to the definition of sufficiency, we need to prove that

$$p_{A_k}(Y \mid n_k(d_k) = t) = \frac{p_{A_k}(Y)}{p_{A_k}(n_k(d_k) = t)}$$

is independent of $A_k$.

Given (3), the observation of $R(k)$ with $n_k(d_k) = t$ follows a binomial distribution:

$$p_{A_k}(n_k(d_k) = t) = \binom{n}{t}(A_k\beta_k)^{t}(1 - A_k\beta_k)^{n-t}. \tag{4}$$

Then, we have

$$p_{A_k}(Y \mid n_k(d_k) = t) = \frac{(A_k\beta_k)^{t}(1 - A_k\beta_k)^{n-t}}{\binom{n}{t}(A_k\beta_k)^{t}(1 - A_k\beta_k)^{n-t}} = \frac{1}{\binom{n}{t}}, \tag{5}$$

which is independent of $A_k$. Then, $n_k(d_k)$ is a sufficient statistic.

Apart from the sufficiency, $n_k(d_k)$, as defined in (2), is a count of the probes reaching $R(k)$ that counts each probe once and once only regardless of how many receivers observe the probe. Therefore, $n_k(d_k)$ is a minimal sufficient statistic in regard to the observation of $R(k)$. ■


IV. Problem Formulation and Analysis 

A. Maximum Likelihood Estimator 

Turning the likelihood function presented in (3) into a log-likelihood function, we have

$$L(A_k \mid \Omega) = n_k(d_k)\log(A_k\beta_k) + (n - n_k(d_k))\log(1 - A_k\beta_k). \tag{6}$$
TABLE I
Frequently used symbols and description

Symbol          Description
$T(k)$          the subtree rooted at link $k$.
$d_k$           the descendants attached to node $k$.
$R(k)$          the receivers attached to $T(k)$.
$A_k$           the pass rate of the path from $v_0$ to $v_k$.
$\beta_k$       the pass rate of the subtree rooted at node $k$.
$\gamma_k$      $A_k \cdot \beta_k$, the pass rate from $v_0$ to $R(k)$.
$X_k^i$         the state of $v_k$ for probe $i$.
$S_k$           the $\sigma$-algebra created from $d_k$.
$n$             the number of probes sent in an experiment.
$n_k(d_k)$      the number of probes reaching $R(k)$.
$n_k(x)$        the number of probes reaching the receivers attached to $T(j), j \in x$.
$I_k(x)$        the number of probes observed by the members of $x$.


Differentiating (6) wrt. $A_k$ and letting the derivative be 0, we have

$$\frac{n_k(d_k)}{A_k} - \frac{(n - n_k(d_k))\beta_k}{1 - A_k\beta_k} = 0. \tag{7}$$

Given the i.i.d. model assumed previously and the multicast used in probing, we have the following equation to link the observation of $R(k)$ to $\beta_k$, which is defined as the pass rate of the subtree rooted at node $k$:

$$1 - \beta_k = \prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right). \tag{8}$$

Solving $\beta_k$ from (8) and using it in (7), we have a likelihood equation as

$$1 - \frac{n_k(d_k)}{n \cdot A_k} = \prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right). \tag{9}$$

Using $\gamma_k$ to replace $\frac{n_k(d_k)}{n}$, since the latter is the empirical value of the former, we have a likelihood equation as follows:

$$1 - \frac{\gamma_k}{A_k} = \prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right). \tag{10}$$

(10) is identical to the estimator proposed in [2]. In order to find the correlations considered in the MLE, we use (9) rather than (10) in the rest of this section because it explicitly connects observations to correlations.
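When node $k$ has many children, (10) has no closed-form root and is solved numerically. The sketch below uses plain bisection on the interval $(\gamma_k, 1]$; the choice of bisection, the bracket, and the function name `solve_mle` are assumptions of this example, not the iterative procedures (EM, Newton-Raphson) used in the cited works.

```python
from math import prod


def solve_mle(gamma_k, gamma_children, iterations=200):
    """Bisection for 1 - gamma_k/A = prod_j (1 - gamma_j/A) on (gamma_k, 1]."""
    def h(a):
        # Difference between the two sides of equation (10).
        return 1.0 - gamma_k / a - prod(1.0 - g / a for g in gamma_children)

    lo, hi = gamma_k, 1.0
    if h(lo) * h(hi) > 0:
        raise ValueError("no sign change on (gamma_k, 1]")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid  # the root lies in [lo, mid]
        else:
            lo = mid  # the root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Noise-free inputs built from A_k = 0.8 and subtree pass rates 0.9, 0.7, 0.6.
a_true = 0.8
z = [0.9, 0.7, 0.6]
gamma_children = [a_true * zj for zj in z]
gamma_k = a_true * (1.0 - prod(1.0 - zj for zj in z))
a_hat = solve_mle(gamma_k, gamma_children)
```

With three children the equation is a quadratic in $A_k$; the point of the sketch is that the same routine applies unchanged when the degree is too high for an analytic solution.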


B. Correlations and Observation 

To find the number of correlations considered by the MLE, both sides of (9), the right hand side (RHS) and the left hand side (LHS), are expanded to show the correspondences between observations and correlations. The correlations are called the predictors of the observations since the former predict the latter. For instance, $\gamma_i \cdot \gamma_j / A_k$, $i, j \in d_k \wedge i \neq j$, is the predictor of the portion of probes that are simultaneously observed by at least two receivers attached to subtree $i$ and subtree $j$, respectively.
To find the correlations involved in (9), a $\sigma$-algebra, $S_k$, is created over $d_k$. Let $\Sigma_k = S_k \setminus \emptyset$ be the non-empty sets in $S_k$; each member of $\Sigma_k$ corresponds to a pair of predictor and observation. If the number of elements in a member of $\Sigma_k$ is defined as the degree of the correlation, $\Sigma_k$ can be divided into $|d_k|$ exclusive groups for correlations of degree 1 to $|d_k|$. Let $S_k(i), i \in \{1, \ldots, |d_k|\}$, denote the group that considers the correlations involving $i$ members of $d_k$; we call them $i$-wise correlations.

For example, if $d_k = \{i, j, k, l\}$, $S_k(2) = \{(i, j), (i, k), (i, l), (j, k), (j, l), (k, l)\}$ consists of all of the pairwise correlations in $d_k$ and $S_k(3) = \{(i, j, k), (i, j, l), (i, k, l), (j, k, l)\}$ consists of the tripletwise correlations.
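The groups $S_k(i)$ are simply the $i$-element subsets of $d_k$. A short sketch of the enumeration, with the helper name `correlation_groups` introduced only for illustration:

```python
from itertools import combinations


def correlation_groups(d_k):
    """Map each degree i to S_k(i), the list of i-element subsets of d_k."""
    return {i: list(combinations(d_k, i)) for i in range(1, len(d_k) + 1)}


groups = correlation_groups(["i", "j", "k", "l"])
pairwise = groups[2]      # S_k(2)
triplewise = groups[3]    # S_k(3)
total = sum(len(g) for g in groups.values())  # |Sigma_k| = 2^|d_k| - 1
```

The total grows exponentially with $|d_k|$, which is one way to see why using every correlation leads to a high degree likelihood equation.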

Given $\Sigma_k$, $n_k(d_k)$ can be decomposed into the probes that are observed by the members of $\Sigma_k$. If $x$ is a member of $\Sigma_k$ and $|x| > 1$, a probe is said to be observed by $x$ if and only if, for every subtree $j, j \in x$, at least one receiver attached to the subtree observes the probe; this is called a simultaneous observation. To explicitly express $n_k(d_k)$ by $n_j(d_j), j \in d_k$, the function $I_k(x), x \in \Sigma_k$, is introduced to return the number of probes observed by $x$ in an experiment. If $u_j^i$ is the observation of $R(j)$ for probe $i$, which is equal to

$$u_j^i = \bigvee_{k \in R(j)} y_k^i,$$

we have

$$I_k(x) = \sum_{i=1}^{n} \bigwedge_{j \in x} u_j^i, \quad x \in \Sigma_k.$$

If $x = (j)$,

$$I_k(x) = n_j(d_j), \quad j \in d_k.$$

Then, we have

$$n_k(d_k) = \sum_{i=1}^{|d_k|} (-1)^{i-1} \sum_{x \in S_k(i)} I_k(x). \tag{12}$$

(12) states that $n_k(d_k)$ is equal to a series of alternating adding and subtracting operations that ensure each probe observed by $R(k)$ is counted once and once only in $n_k(d_k)$.
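Equation (12) is an inclusion-exclusion identity and can be checked on a small sample. In the sketch below the ten subtree-level observations and the helper names are hypothetical; each tuple holds $u_j^i$ for three subtrees.

```python
from itertools import combinations


def simultaneous_count(u, x):
    """I_k(x): number of probes observed by every subtree in x."""
    return sum(1 for probe in u if all(probe[j] for j in x))


def confirmed_by_inclusion_exclusion(u, d_k):
    """Right-hand side of (12)."""
    total = 0
    for i in range(1, len(d_k) + 1):
        sign = (-1) ** (i - 1)
        total += sign * sum(simultaneous_count(u, x) for x in combinations(d_k, i))
    return total


# Hypothetical observations u_j^i for d_k = {0, 1, 2}, n = 10.
u = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
     (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0)]
d_k = (0, 1, 2)

direct = sum(1 for probe in u if any(probe))      # n_k(d_k) counted directly
via_12 = confirmed_by_inclusion_exclusion(u, d_k)  # 14 - 9 + 2
```

Both counts agree because every probe seen by $m$ subtrees contributes $\sum_{i=1}^{m}(-1)^{i-1}\binom{m}{i} = 1$ to the alternating sum.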


C. Correlations in MLE 

Given (12), we are able to prove that the MLE proposed in [2] considers all of the correlations in $\Sigma_k$.

Theorem 2: 1) (9) is a full likelihood estimator that considers all of the correlations in $\Sigma_k$;

2) (9) consists of observed values and their predictors, one pair for each member of $\Sigma_k$; and

3) the estimate obtained from (9) is a fit that minimises the alternating differences between observed values and the corresponding predictors.

Proof: (9) is a full likelihood estimator that considers all of the correlations in $d_k$. To prove 2) and 3), we expand both sides of (9) to pair the observed values with their predictors according to $S_k$. We take three steps to achieve the goal.

1) If we use (12) to replace $n_k(d_k)$ on the LHS of (9), the LHS becomes:

$$1 - \frac{n_k(d_k)}{n \cdot A_k} = 1 - \frac{1}{n \cdot A_k}\left[\sum_{i=1}^{|d_k|}(-1)^{i-1}\sum_{x \in S_k(i)} I_k(x)\right]. \tag{13}$$

2) If we expand the product term located on the RHS of (9), we have:

$$\prod_{j \in d_k}\left(1 - \frac{\gamma_j}{A_k}\right) = 1 - \sum_{i=1}^{|d_k|}(-1)^{i-1}\sum_{x \in S_k(i)}\frac{\prod_{j \in x}\gamma_j}{A_k^{i}}, \tag{14}$$

where the alternating adding and subtracting operations remove the impact of redundant observations in $n_k(d_k)$.

3) Deducting 1 from both (13) and (14) and then multiplying the results by $A_k$, (9) turns into

$$\sum_{i=1}^{|d_k|}(-1)^{i}\sum_{x \in S_k(i)}\frac{I_k(x)}{n} = \sum_{i=1}^{|d_k|}(-1)^{i}\sum_{x \in S_k(i)}\frac{\prod_{j \in x}\gamma_j}{A_k^{i-1}}. \tag{15}$$

It is clear that there is a correspondence between the terms across the equal sign, where the terms on the LHS are the observed values and the terms on the RHS are the predictors. If we rewrite (15) as

$$\sum_{i=1}^{|d_k|}(-1)^{i}\sum_{x \in S_k(i)}\left(\frac{I_k(x)}{n} - \frac{\prod_{j \in x}\gamma_j}{A_k^{i-1}}\right) = 0, \tag{16}$$

the correspondence between correlations and observed values becomes obvious in (16).

To distinguish this MLE from the others proposed in this paper, we call it the original MLE in the rest of the paper.


V. Explicit Estimators based on Composite Likelihood 
(16) shows that the original MLE considers all of the correlations available in $\Sigma_k$, which makes the estimator a high degree polynomial if the number of subtrees rooted at node $k$ is larger than 6. To find explicit estimators in this circumstance, we must reduce the number of correlations considered by an estimator and use composite likelihood, also called pseudo-likelihood by Besag [16], to create likelihood functions. Three strategies are used here to reduce the number of correlations used in estimation: block-wised, reduce scaled, and individual based. The block-wised strategy divides all correlations into blocks, each consisting of the correlations of the same degree, from pairwise to $|d_k|$-wise. The reduce scaled strategy, as named, reduces the number of subtrees considered in estimation. The individual based strategy considers each correlation separately.


A. Block-wised Estimator (BWE) 

Let $\psi_k(x) = \prod_{j \in x}\frac{\gamma_j}{A_k}$, where $x \subseteq d_k$, be the pass rate of the subtrees in $x$. If the number of probes reaching node $k$ is denoted by $\bar{n}_k(d_k)$, the empirical value of $\psi_k(x)$ is equal to

$$\frac{I_k(x)}{\bar{n}_k(d_k)}.$$

From (16), $|d_k| - 1$ block-wised likelihood functions can be identified, from the pairwise likelihood to the $|d_k|$-wise likelihood. Each of them corresponds to an item in the first summation of (16). In order to have a unique likelihood function for all of them, let the single-wise likelihood function be 1 and let the $i$-wise likelihood function be $L_c(i; A_k; y)$. Then, the block-wised likelihood functions can be expressed uniformly.

Lemma 1: There are a number of composite likelihood functions, one for each type of correlation, varying from pairwise to $|d_k|$-wise. The composite likelihood function $L_c(i; A_k; y), i \in \{2, \ldots, |d_k|\}$, has the following form:

$$L_c(i; A_k; y) = \frac{\prod_{x \in S_k(i)}(A_k\psi_k(x))^{n_k(x)}(1 - A_k\psi_k(x))^{n - n_k(x)}}{\prod_{y \in S_k(i-1)}(A_k\psi_k(y))^{n_k(y)}(1 - A_k\psi_k(y))^{n - n_k(y)}}, \quad i \in \{2, \ldots, |d_k|\}. \tag{17}$$

Proof: The numerator on the RHS of (17) is the likelihood function considering the correlations from pairwise to $i$-wise inclusively and the denominator is the likelihood function from single-wise to $(i-1)$-wise. Their quotient is the likelihood dedicated to the $i$-wise correlations. ■

Let $A_k(i)$ be the estimator derived from $L_c(i; A_k; y)$. Then, we have the following theorem.

Theorem 3: Each of the composite likelihood equations obtained from (17) is an explicit estimator of $A_k$ as follows:

$$A_k(i) = \left(\frac{\sum_{x \in S_k(i)}\prod_{j \in x}\gamma_j}{\sum_{x \in S_k(i)}\frac{I_k(x)}{n}}\right)^{\frac{1}{i-1}}, \quad i \in \{2, \ldots, |d_k|\}. \tag{18}$$

Proof: Firstly, we rewrite (17) into a log-likelihood function. We then differentiate the log-likelihood function wrt. $A_k$ and let the derivative be 0. The likelihood equation (18) follows. ■

In the rest of the paper, $A_k(i)$ is used for the $i$-wise estimator and $\hat{A}_k(i)$ for the estimate obtained by $A_k(i)$.
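A block-wised estimate needs only the empirical $\gamma_j$ and the simultaneous-observation counts $I_k(x)$. The sketch below evaluates the ratio of summed predictors to summed observations, raised to $1/(i-1)$, on a balanced toy sample; the sample, the tuple layout, and the function name `bwe` are assumptions of this example.

```python
from itertools import combinations, product
from math import prod


def bwe(u, i):
    """i-wise block estimate of A_k from subtree-level 0/1 observations u."""
    n = len(u)
    d_k = range(len(u[0]))
    gamma = [sum(probe[j] for probe in u) / n for j in d_k]
    predicted = 0.0  # sum over x in S_k(i) of prod_{j in x} gamma_j
    observed = 0.0   # sum over x in S_k(i) of I_k(x) / n
    for x in combinations(d_k, i):
        predicted += prod(gamma[j] for j in x)
        observed += sum(1 for probe in u if all(probe[j] for j in x)) / n
    return (predicted / observed) ** (1.0 / (i - 1))


# Balanced toy sample: 8 of 16 probes reach node k, and among those 8 every
# combination of outcomes at the three subtrees appears exactly once.
u = list(product((0, 1), repeat=3)) + [(0, 0, 0)] * 8
pairwise_estimate = bwe(u, 2)
triplewise_estimate = bwe(u, 3)
```

On this sample both blocks return the fraction of probes that reached node $k$, without solving any polynomial.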


B. Reduce Scaled Estimator (RSE) 

Instead of grouping the correlations of the same degree into a likelihood equation, the correlations can be grouped according to the subtrees rooted at node $k$. Since only a subset $x, x \subset d_k$, of the subtrees is selected, the estimators are called RSE. The log-likelihood function of the correlations within $x$ is as follows:

$$L(A_k \mid \Omega_x) = n_k(x)\log(A_k\beta_k(x)) + (n - n_k(x))\log(1 - A_k\beta_k(x)), \tag{19}$$

where $n_k(x)$ is the number of probes reaching node $k$ according to the observations of the receivers attached to $T(j), j \in x$, which equals

$$n_k(x) = \sum_{i=1}^{|x|}(-1)^{i-1}\sum_{y \in S_k(x), |y| = i} I_k(y),$$

where $S_k(x)$ is the $\sigma$-algebra created over $x$. $\beta_k(x)$ is the pass rate of $T(j), j \in x$, and is defined by

$$1 - \beta_k(x) = \prod_{j \in x}\left(1 - \frac{\gamma_j}{A_k}\right). \tag{20}$$

Then, a likelihood equation similar to (9) is obtained:

$$1 - \frac{n_k(x)}{n \cdot A_k} = \prod_{j \in x}\left(1 - \frac{\gamma_j}{A_k}\right). \tag{21}$$

Clearly, (21) is a reduced-scale version of (9). If $|x| < 5$, the equation is solvable. The estimator is denoted by $Am_k(x)$.
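For the smallest case $|x| = 2$, (21) expands to $1 - \frac{n_k(x)}{nA_k} = 1 - \frac{\gamma_a + \gamma_b}{A_k} + \frac{\gamma_a\gamma_b}{A_k^2}$, which is linear in $A_k$ after multiplying by $A_k^2$ and gives $A_k = \gamma_a\gamma_b / (\gamma_a + \gamma_b - n_k(x)/n)$. The sketch below evaluates this closed form on hypothetical data; the function name `rse_pair` and the sample are assumptions of the example.

```python
def rse_pair(u, a, b):
    """Solve (21) for x = {a, b}; the equation is linear in A_k in this case."""
    n = len(u)
    gamma_a = sum(probe[a] for probe in u) / n
    gamma_b = sum(probe[b] for probe in u) / n
    # n_k(x): probes observed by at least one of the two subtrees.
    n_k_x = sum(1 for probe in u if probe[a] or probe[b])
    return gamma_a * gamma_b / (gamma_a + gamma_b - n_k_x / n)


# Hypothetical subtree-level observations for three subtrees, n = 10.
u = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
     (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0)]
estimate = rse_pair(u, 0, 1)
```

Since $\gamma_a + \gamma_b - n_k(x)/n$ is the fraction of probes seen by both subtrees, the two-subtree case reduces to the ratio of the pairwise predictor to the pairwise observation.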


C. Individual based Estimator (IBE) 

The likelihood functions of the estimators in IBE have a structure similar to (19), where $\beta_k(x)$ and $n_k(x)$ are replaced by $\psi_k(x)$ and $I_k(x)$, respectively. Let $\Sigma'_k = \Sigma_k \setminus S_k(1)$ be the correlations considered by IBE. Then, the log-likelihood function for $A_k$ given observation $I_k(x)$ is equal to

$$L(A_k \mid I_k(x)) = I_k(x)\log(A_k\psi_k(x)) + (n - I_k(x))\log(1 - A_k\psi_k(x)), \quad x \in \Sigma'_k. \tag{22}$$

We then have the following theorem.

Theorem 4: Given (22), $A_k\psi_k(x)$ is a Bernoulli process. The MLE for $A_k$ given $I_k(x)$ equals

$$Al_k(x) = \left(\frac{\prod_{j \in x}\gamma_j}{\frac{I_k(x)}{n}}\right)^{\frac{1}{|x|-1}}. \tag{23}$$

Proof: Using the same procedure as that used in Section IV-A, we have the theorem. ■

Comparing (18) with (23), we can find that $Al_k(x)$, where $|x| = i$, is a type of geometric mean and $A_k(i)$ is the arithmetic mean of $Al_k(x), x \in S_k(i)$. Therefore, $A_k(i)$ is more robust than $Al_k(x)$.
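The relation between the two families can be seen numerically: an individual estimate uses one correlation $x$, while the block estimate pools all correlations of the same degree and therefore falls between the smallest and largest individual estimates. The sketch below assumes hypothetical data and helper names (`gammas`, `observed_fraction`, `ibe`, `bwe`).

```python
from itertools import combinations
from math import prod


def gammas(u):
    """Empirical gamma_j for every subtree j."""
    n = len(u)
    return [sum(probe[j] for probe in u) / n for j in range(len(u[0]))]


def observed_fraction(u, x):
    """I_k(x) / n."""
    return sum(1 for probe in u if all(probe[j] for j in x)) / len(u)


def ibe(u, x):
    """Individual based estimate from the single correlation x."""
    g = gammas(u)
    return (prod(g[j] for j in x) / observed_fraction(u, x)) ** (1.0 / (len(x) - 1))


def bwe(u, i):
    """Block-wised estimate from all i-wise correlations."""
    g = gammas(u)
    subsets = list(combinations(range(len(g)), i))
    predicted = sum(prod(g[j] for j in x) for x in subsets)
    observed = sum(observed_fraction(u, x) for x in subsets)
    return (predicted / observed) ** (1.0 / (i - 1))


# Hypothetical subtree-level observations for three subtrees, n = 10.
u = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
     (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (0, 0, 0)]
pair_estimates = {x: ibe(u, x) for x in combinations(range(3), 2)}
block_estimate = bwe(u, 2)
```

The pooled ratio is a mediant of the individual ratios, so a single poorly fitting pair moves the block estimate less than it moves its own individual estimate.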

Using and combining the strategies presented above, we can have various explicit estimators. In fact, the estimator proposed in [8, 11] is one of them: it divides $d_k$ into two groups and only considers the pairwise correlations between the members of the two groups. Therefore, although the estimator proposed in [8, 11] is an MLE in terms of the observation used in estimation, it is not the same as (10).

VI. Properties of the Estimators 

To evaluate the performance of the estimators proposed in this paper, we need to study the statistical properties of them, 
i.e., unbiasedness, consistency, uniqueness, variance, and efficiency. This section is devoted to the properties that consist of a 
number of lemmas, theorems and corollaries. 


A. Unbiasedness and Consistency 

The original MLE has been proved to be unbiased and consistent in [2]. Using the same methodology, we are able to 
prove the unbiasedness and consistency of the estimators in RSE. Thus, our attention here is focused on the properties of the 
estimators in IBE and BWE. 

For the unbiasedness of $Al_k(x), x \in \Sigma'_k$, we have the following theorem.

Theorem 5: $Al_k(x)$ is an unbiased estimator.

Proof: Let $\bar{n}_k(d_k)$ be the number of probes reaching $v_k$ and let $z_j, j \in d_k$, be the pass rate of $T(j)$. In addition, let $\bar{z}_j = \frac{n_j(d_j)}{\bar{n}_k(d_k)}$ be the sample mean of $z_j$ and $\bar{A}_k = \frac{\bar{n}_k(d_k)}{n}$ be the sample mean of $A_k$. Note that $z_j$ and $z_l$, $j, l \in d_k$, are independent of each other if $j \neq l$. Apart from those, we use $\bar{n}_k(d_k)\prod_{j \in x}\bar{z}_j$ to replace $I_k(x)$ in the following derivation, since the subtrees in $x$ pass probes independently of each other. With $\hat{\gamma}_j = \frac{n_j(d_j)}{n}$, we then have

$$E\left(Al_k(x)^{|x|-1}\right) = E\left(\frac{\prod_{j \in x}\hat{\gamma}_j}{\frac{I_k(x)}{n}}\right) = E\left(\frac{\left(\frac{\bar{n}_k(d_k)}{n}\right)^{|x|}\prod_{j \in x}\bar{z}_j}{\frac{\bar{n}_k(d_k)}{n}\prod_{j \in x}\bar{z}_j}\right) = E\left(\left(\frac{\bar{n}_k(d_k)}{n}\right)^{|x|-1}\right) \tag{24}$$

$$= \frac{A_k^{|x|}\prod_{j \in x} z_j}{A_k\prod_{j \in x} z_j} = A_k^{|x|-1}. \tag{25}$$

The theorem follows. ■

Given Theorem 5, we have the following corollary.

Corollary 1: $A_k(i)$ is an unbiased estimator.

Proof: According to Theorem 5, we have

$$E\left(A_k(i)^{i-1}\right) = \frac{\sum_{x \in S_k(i)} A_k^{i}\prod_{j \in x} z_j}{\sum_{x \in S_k(i)} A_k\prod_{j \in x} z_j} = A_k^{i-1}. \tag{26}$$

■

Note that we here prove that $Al_k(x), x \in \Sigma'_k$, are unbiased estimators, rather than asymptotically unbiased ones as obtained in [1] for $Al_k(d_k)$.

Further, we can prove that $Al_k(x)$, $|x| = i$, and $A_k(i)$ are consistent estimates in the following lemma and theorem.

Lemma 2: $Al_k(x)$ is a consistent estimate of $A_k$.

Proof: We have the following two points to prove the lemma.

1) Theorem 5 shows that $Al_k(x)$ is equivalent to the mean of $A_k$. Then, according to the law of large numbers, $Al_k(x) \to A_k$.

2) From the above and the continuity of $Al_k(x)$ in the values of $\gamma_j, j \in x$, and $I_k(x)/n$ generated as $A_k$ ranges over its support set, the result follows. ■


Then, we have

Theorem 6: $A_k(i)$ is a consistent estimate of $A_k$.

Proof: As stated, $A_k(i)$ is a mean of $Al_k(x), x \in S_k(i)$, that satisfies the following inequality:

$$\min_{x \in S_k(i)} Al_k(x)^{i-1} \le A_k(i)^{i-1} \le \max_{x \in S_k(i)} Al_k(x)^{i-1}.$$

Since all of $Al_k(x), x \in S_k(i)$, are consistent estimators, $A_k(i)$ is a consistent estimator. ■

For the uniqueness of $A_k(i)$, we have the following theorem.

Theorem 7: If

$$\sum_{x \in S_k(i)}\prod_{j \in x}\gamma_j < \sum_{x \in S_k(i)}\frac{I_k(x)}{n}, \tag{27}$$

there is only one solution in $(0, 1)$ for $A_k(i)$, $2 \le i \le |d_k|$.

Proof: Since the support of $A_k$ is in $(0, 1)$, we can reach this conclusion from (18). ■


B. Efficiency and Variance of $Al_k(x)$, $Am_k(x)$, and the original MLE

Given (22), we have the following theorem for the Fisher information of an observation, $y$, in IBE that can be used to determine the efficiency of $Al_k(x), x \in \Sigma'_k$.

Theorem 8: The Fisher information of $y$ on $Al_k(x), x \subseteq d_k$, is equal to $\frac{\psi_k(x)}{A_k(1 - A_k\psi_k(x))}$.
Proof: Considering that $I_k(x) = y$ is the observation of the receivers attached to $x$, we have the following as the log-likelihood function of the observation:

$$L(A_k \mid y) = y\log(A_k\psi_k(x)) + (1 - y)\log(1 - A_k\psi_k(x)). \tag{28}$$

Differentiating (28) wrt. $A_k$, we have

$$\frac{\partial L(A_k \mid y)}{\partial A_k} = \frac{y}{A_k} - \frac{(1 - y)\psi_k(x)}{1 - A_k\psi_k(x)}. \tag{29}$$

We then have

$$\frac{\partial^2 L(A_k \mid y)}{\partial A_k^2} = -\frac{y}{A_k^2} - \frac{(1 - y)\psi_k(x)^2}{(1 - A_k\psi_k(x))^2}. \tag{30}$$

If $I(Al_k(x) \mid y)$ is used to denote the Fisher information of observation $y$ for $A_k$ in $Al_k(x)$, we have

$$I(Al_k(x) \mid y) = -E\left(\frac{\partial^2 L(A_k \mid y)}{\partial A_k^2}\right) = \frac{E(y)}{A_k^2} + \frac{E(1 - y)\psi_k(x)^2}{(1 - A_k\psi_k(x))^2} = \frac{\psi_k(x)}{A_k(1 - A_k\psi_k(x))}, \tag{31}$$

that is the information provided by $y$ for $A_k$. ■
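The last step of (31) uses $E(y) = A_k\psi_k(x)$. It can be checked numerically by taking the expectation of the negative second derivative (30) over the two outcomes of $y$ and comparing it with the closed form; the numeric values and function names below are chosen only for illustration.

```python
def fisher_closed_form(a_k, psi):
    """psi / (A_k (1 - A_k psi)), the right-hand side of (31)."""
    return psi / (a_k * (1.0 - a_k * psi))


def fisher_by_expectation(a_k, psi):
    """-E[d^2 L / dA_k^2] with y ~ Bernoulli(A_k psi), using (30)."""
    p = a_k * psi  # probability that the probe is observed by x

    def neg_second_derivative(y):
        return y / a_k**2 + (1 - y) * psi**2 / (1.0 - a_k * psi) ** 2

    return p * neg_second_derivative(1) + (1.0 - p) * neg_second_derivative(0)


a_k, psi = 0.9, 0.5  # hypothetical path and subtree pass rates
closed = fisher_closed_form(a_k, psi)
expected = fisher_by_expectation(a_k, psi)
```

Both routes give the same number, which confirms the algebra that collapses the two terms of the expectation into a single fraction.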

Given (31), we are able to have a formula for the Fisher information of the original MLE and the estimators in RSE. In order to achieve this, let $\beta_k(d_k) = \beta_k$. Then, we have the following corollary.

Corollary 2: The Fisher information of observation $y$ for $A_k$ in the original MLE and $Am_k(x), x \subseteq d_k$, is equal to

$$\frac{\beta_k(x)}{A_k(1 - A_k\beta_k(x))}, \quad x \subseteq d_k. \tag{32}$$

Proof: Replacing $n_k(d_k)$ and $n_k(x)$ by $y$ and replacing $n - n_k(d_k)$ and $n - n_k(x)$ by $1 - y$ in (6) and (19), respectively, and then using the same procedure as that used in Theorem 8 on the log-likelihood functions, the corollary follows. ■

Because of the similarity between (31) and (32), the two equations have the same features in terms of support, singularity, and maximums. After eliminating their singular points, the support of $A_k$ is in $(0, 1)$ and the support of $\beta_k(x)$ (or $\psi_k(x)$) is in $[0, 1]$. Both (31) and (32) are convex functions in the support and reach the maximum at the points $A_k \to 1, \beta_k(x) = 1$ (or $\psi_k(x) = 1$) and $A_k \to 0, \beta_k(x) = 1$ (or $\psi_k(x) = 1$). Given $A_k$, (32) is a monotonically increasing function of $\beta_k(x)$ whereas (31) is a monotonically increasing function of $\psi_k(x)$. Despite the similarity, the efficiencies of $Al_k(x)$ and $Am_k(x)$ go in opposite directions if $x$ is replaced by $y$, $x \subset y$: $Al_k(y)$ is less efficient than $Al_k(x)$ since $\psi_k(x) > \psi_k(y)$, but $Am_k(y)$ is more efficient than $Am_k(x)$ since $\beta_k(x) < \beta_k(y)$. Given Theorem 8, we are able to compare the efficiency of $Al_k(x), x \in \Sigma'_k$, and have the following corollary.

Corollary 3: The efficiency of $Al_k(x), x \in \Sigma'_k$, forms a partial order that is the same as that formed by the inclusion of the members of $\Sigma'_k$, where the most efficient estimator must be one of the $Al_k(x), x \in S_k(2)$, and the least efficient one must be $Al_k(d_k)$.

Proof: The inclusion in $\Sigma'_k$ forms a partial order, where a member of $S_k(i)$ is included by at least one member of $S_k(i+1)$, $i + 1 \le |d_k|$. Because all of the members of $\Sigma'_k$ except those in $S_k(2)$ include at least one member of $S_k(2)$, the most efficient estimator must be one of $Al_k(x), x \in S_k(2)$. On the other hand, since $\forall x\,(x \in \Sigma'_k \rightarrow x \subset d_k)$, $Al_k(d_k)$ is the least efficient estimator in $Al_k(x), x \in \Sigma'_k$. ■
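The opposite directions of $Al_k(x)$ and $Am_k(x)$ and the extremes named in Corollary 3 can be illustrated by brute-force enumeration on a small example. The sketch below rests on assumptions that are not stated in this section: $\psi_k(x)$ is modelled as the probability that every subtree in $x$ observes a probe that has reached node $k$ (a product of per-subtree pass rates), and $\beta_k(x)$ as the probability that at least one of them does. The pass rates and all function names are hypothetical.

```python
from itertools import combinations
from math import prod


def psi(rates, x):
    """Assumed form of psi_k(x): every subtree in x observes the probe."""
    return prod(rates[j] for j in x)


def beta(rates, x):
    """Assumed form of beta_k(x): at least one subtree in x observes it."""
    return 1.0 - prod(1.0 - rates[j] for j in x)


def information(a_k, rate):
    """Common shape of (31) and (32): rate / (A_k * (1 - A_k * rate))."""
    return rate / (a_k * (1.0 - a_k * rate))


def candidate_sets(rates, min_size=2):
    """All subsets of the subtree ids with at least min_size members."""
    ids = sorted(rates)
    for size in range(min_size, len(ids) + 1):
        yield from combinations(ids, size)


def opposite_directions_hold(a_k, rates):
    """True if, whenever x is properly included in y, the information
    does not increase for Al_k and does not decrease for Am_k."""
    sets = list(candidate_sets(rates))
    for x in sets:
        for y in sets:
            if set(x) < set(y):
                if information(a_k, psi(rates, x)) < information(a_k, psi(rates, y)):
                    return False
                if information(a_k, beta(rates, x)) > information(a_k, beta(rates, y)):
                    return False
    return True


def extremes_for_al(a_k, rates):
    """Most and least informative subsets for Al_k under the assumed psi."""
    sets = list(candidate_sets(rates))
    score = lambda x: information(a_k, psi(rates, x))
    return max(sets, key=score), min(sets, key=score)


if __name__ == "__main__":
    rates = {1: 0.99, 2: 0.99, 3: 0.95, 4: 0.90}  # hypothetical pass rates
    print(opposite_directions_hold(0.99, rates))
    print(extremes_for_al(0.99, rates))
```

Under these assumptions the most informative subset is always a pair and the least informative one is the full set $d_k$, which is the ordering the corollary describes.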
Equation (10), $Am_k(x)$, and $Al_k(x)$ are all MLEs that focus on different parts of the observations obtained. Because of this, they share a number of features, including the form of their likelihood functions and efficiency equations. In addition, their variances can be expressed by a general function. Let mle denote all of them. Then, we have the following theorem for the variances of the estimators in mle.

Theorem 9: The variance of the estimators in mle is equal to

$$var(mle) = \frac{A_k(1 - A_k\delta_k(x))}{\delta_k(x)}, \quad x \subset d_k, \tag{33}$$

where $\delta_k(x)$ is the pass rate of the subtrees in $x$ that is calculated on the basis of the definition of the individual estimator.

Proof: The passing process described by (22) is a Bernoulli process that falls into the exponential family and satisfies the
regularity conditions presented in y . Thus, the variance of an estimator in mle reaches the Cramer-Rao bound that is the 
reciprocal of the Fisher information. ■ 

(33) unveils the fact that the estimates obtained by an estimator spread out more widely than those obtained by direct measurement. The extent of the spread is determined by $\delta_k(x)$, the pass rate of the subtrees connecting node $k$ to the observers. If $\delta_k(x) = 1$, there is no further spread beyond that of direct measurement. Otherwise, the variance increases in a superlinear fashion as $\delta_k(x)$ decreases.
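To make the effect of $\delta_k(x)$ concrete, the following sketch evaluates (33) for a few pass rates and compares the result with the direct-measurement case $\delta_k(x) = 1$, for which (33) reduces to $A_k(1 - A_k)$. The function names and the chosen values are illustrative assumptions; the value returned is the per-observation quantity given by (33).

```python
def mle_variance(a_k: float, delta: float) -> float:
    """Per-observation variance from (33): A_k (1 - A_k delta) / delta."""
    if not 0.0 < a_k < 1.0:
        raise ValueError("a_k must lie in (0, 1)")
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return a_k * (1.0 - a_k * delta) / delta


def inflation_over_direct(a_k: float, delta: float) -> float:
    """Ratio of (33) to its value at delta = 1 (direct measurement)."""
    return mle_variance(a_k, delta) / mle_variance(a_k, 1.0)


if __name__ == "__main__":
    a_k = 0.99  # illustrative pass rate of the path of interest
    for delta in (1.0, 0.99, 0.95, 0.90, 0.80):
        print(delta, mle_variance(a_k, delta), inflation_over_direct(a_k, delta))
```

With $A_k = 0.99$, lowering $\delta_k(x)$ from 1 to 0.99 already roughly doubles the value given by (33), from 0.0099 to 0.0199, which illustrates how sensitive the spread is to the pass rate of the subtrees when the path itself is nearly lossless.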


C. Efficiency and Variance of BWE 

The estimate obtained by $A_k(i)$ is a type of arithmetic mean of $Al_k(x), x \in S_k(i)$, and has the same advantages and disadvantages as the arithmetic mean. $A_k(i)$ differs from $Al_k(x)$ in using a statistic that is not sufficient, since some probes are considered more than once. Because of this, the Fisher information cannot be used to evaluate the efficiency of an estimator in BWE. Nevertheless, as a special arithmetic mean of $Al_k(x), |x| = i$, $A_k(i)$ shares many features with $Al_k(x)$. Thus, $A_k(i)$ is more efficient than $A_k(i+1)$ and the variance of $A_k(i)$ is smaller than that of $A_k(i+1)$.

VII. Model Selection and Simulation 

The large number of estimators in IBE, RSE and BWE, plus the original MLE, makes model selection possible. However, finding the most suitable one in terms of efficiency and computational complexity is a hard task since the two goals conflict with each other. Although one is able to identify the most suitable estimator by computing the Kullback-Leibler divergence or the composite Kullback-Leibler divergence of the estimators, the cost of computing the Akaike information criterion (AIC) for each of the estimators makes this approach prohibitive. Nevertheless, the derivation of (32) solves the problem, since (32) shows that the most suitable estimator must use the subtrees that have the highest end-to-end pass rates.
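The selection rule described above amounts to ranking the subtrees by their observed end-to-end pass rates and keeping the best few. The sketch below is one possible reading of that rule, under the assumption that the end-to-end pass rate of a subtree is estimated by the fraction of probes observed by the receivers of that subtree; the probe counts, subtree identifiers, and function names are hypothetical.

```python
def end_to_end_pass_rates(received_counts, n_probes):
    """Estimate each subtree's end-to-end pass rate as received / sent."""
    if n_probes <= 0:
        raise ValueError("n_probes must be positive")
    return {j: count / n_probes for j, count in received_counts.items()}


def select_subtrees(pass_rates, size):
    """Return the ids of the `size` subtrees with the highest pass rates.

    Ties are broken by the smaller subtree id so the choice is deterministic.
    """
    if not 0 < size <= len(pass_rates):
        raise ValueError("size must be between 1 and the number of subtrees")
    ranked = sorted(pass_rates, key=lambda j: (-pass_rates[j], j))
    return tuple(sorted(ranked[:size]))


if __name__ == "__main__":
    # Hypothetical counts of probes observed per subtree out of 1000 sent.
    counts = {1: 980, 2: 979, 3: 941, 4: 981}
    rates = end_to_end_pass_rates(counts, 1000)
    print(select_subtrees(rates, 2))  # subtrees to use for |x| = 2
    print(select_subtrees(rates, 3))  # subtrees to use for |x| = 3
```

The ranking costs only a sort over the subtrees, so it avoids evaluating a divergence or an information criterion for every candidate estimator.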


A. Simulation 

To compare the performance of the original MLE, $A_k(i)$, and $Al_k(x)$, three rounds of simulations are conducted in various settings. Five estimators, namely the original MLE, $A_k(2)$, $A_k(3)$, $Al_k(x), |x| = 2$, and $Al_k(x), |x| = 3$, are compared against each other in the simulations, and the results are presented in three tables, Table II to Table IV. The number of samples used in the simulations varies from 300 to 9900 in steps of 300. For each sample size, 20 experiments with different initial seeds are carried out and the means and variances of the estimates obtained by the five estimators are
| Estimators | Full Likelihood | | $A_k(2)$ | | $A_k(3)$ | | $Al_k(x), \lvert x\rvert = 2$ | | $Al_k(x), \lvert x\rvert = 3$ | |
|---|---|---|---|---|---|---|---|---|---|---|
| Samples | Mean | Var | Mean | Var | Mean | Var | Mean | Var | Mean | Var |
| 300 | 0.0088 | 1.59E-05 | 0.0088 | 1.59E-05 | 0.0088 | 1.64E-05 | 0.0087 | 1.59E-05 | 0.0087 | 1.61E-05 |
| 600 | 0.0089 | 1.12E-05 | 0.0089 | 1.12E-05 | 0.0089 | 1.13E-05 | 0.0089 | 1.10E-05 | 0.0088 | 1.12E-05 |
| 900 | 0.0092 | 7.76E-06 | 0.0092 | 7.82E-06 | 0.0091 | 7.84E-06 | 0.0092 | 7.90E-06 | 0.0092 | 8.15E-06 |
| 1200 | 0.0095 | 6.13E-06 | 0.0095 | 6.13E-06 | 0.0094 | 6.17E-06 | 0.0095 | 6.16E-06 | 0.0095 | 5.97E-06 |
| 1500 | 0.0096 | 4.55E-06 | 0.0096 | 4.55E-06 | 0.0096 | 4.80E-06 | 0.0096 | 4.78E-06 | 0.0096 | 4.33E-06 |
| 1800 | 0.0096 | 1.82E-06 | 0.0096 | 1.81E-06 | 0.0096 | 1.92E-06 | 0.0097 | 1.92E-06 | 0.0096 | 1.90E-06 |
| 2100 | 0.0097 | 3.14E-06 | 0.0097 | 3.11E-06 | 0.0097 | 3.14E-06 | 0.0097 | 3.02E-06 | 0.0097 | 3.08E-06 |
| 2400 | 0.0100 | 1.32E-06 | 0.0100 | 1.32E-06 | 0.0100 | 1.36E-06 | 0.0100 | 1.29E-06 | 0.0099 | 1.28E-06 |
| 2700 | 0.0100 | 1.72E-06 | 0.0100 | 1.72E-06 | 0.0100 | 1.74E-06 | 0.0100 | 1.81E-06 | 0.0100 | 1.83E-06 |
| 3000 | 0.0102 | 2.96E-06 | 0.0102 | 2.97E-06 | 0.0102 | 3.01E-06 | 0.0102 | 3.04E-06 | 0.0102 | 2.95E-06 |
| 4800 | 0.0103 | 1.74E-06 | 0.0103 | 1.74E-06 | 0.0103 | 1.74E-06 | 0.0103 | 1.75E-06 | 0.0103 | 1.81E-06 |
| 9900 | 0.0099 | 8.18E-07 | 0.0099 | 8.23E-07 | 0.0099 | 8.20E-07 | 0.0099 | 8.05E-07 | 0.0099 | 8.60E-07 |

TABLE II

Simulation Result of an 8-Descendant Tree with Loss Rate=1%


| Estimators | Full Likelihood | | $A_k(2)$ | | $A_k(3)$ | | $Al_k(x), \lvert x\rvert = 2$ | | $Al_k(x), \lvert x\rvert = 3$ | |
|---|---|---|---|---|---|---|---|---|---|---|
| Samples | Mean | Var | Mean | Var | Mean | Var | Mean | Var | Mean | Var |
| 300 | 0.0088 | 1.59E-05 | 0.0089 | 1.64E-05 | 0.0089 | 1.68E-05 | 0.0091 | 2.36E-05 | 0.0088 | 1.95E-05 |
| 600 | 0.0089 | 1.12E-05 | 0.0089 | 1.14E-05 | 0.0089 | 1.16E-05 | 0.0088 | 1.46E-05 | 0.0089 | 1.26E-05 |
| 900 | 0.0091 | 7.76E-06 | 0.0091 | 7.80E-06 | 0.0091 | 7.83E-06 | 0.0092 | 9.74E-06 | 0.0091 | 8.67E-06 |
| 1200 | 0.0094 | 6.13E-06 | 0.0094 | 6.16E-06 | 0.0094 | 6.18E-06 | 0.0096 | 7.09E-06 | 0.0095 | 6.16E-06 |
| 1500 | 0.0096 | 4.55E-06 | 0.0096 | 4.72E-06 | 0.0096 | 4.81E-06 | 0.0097 | 4.36E-06 | 0.0096 | 4.45E-06 |
| 1800 | 0.0096 | 1.82E-06 | 0.0096 | 1.90E-06 | 0.0096 | 1.95E-06 | 0.0096 | 2.45E-06 | 0.0096 | 1.97E-06 |
| 2100 | 0.0097 | 3.14E-06 | 0.0097 | 3.11E-06 | 0.0097 | 3.11E-06 | 0.0098 | 3.39E-06 | 0.0097 | 3.04E-06 |
| 2400 | 0.0099 | 1.32E-06 | 0.0100 | 1.34E-06 | 0.0100 | 1.35E-06 | 0.0101 | 1.64E-06 | 0.0100 | 1.44E-06 |
| 2700 | 0.0100 | 1.72E-06 | 0.0100 | 1.69E-06 | 0.0100 | 1.67E-06 | 0.0101 | 2.11E-06 | 0.0100 | 1.90E-06 |
| 3000 | 0.0102 | 2.96E-06 | 0.0102 | 2.93E-06 | 0.0102 | 2.91E-06 | 0.0103 | 2.83E-06 | 0.0102 | 2.87E-06 |
| 4800 | 0.0103 | 1.74E-06 | 0.0104 | 1.74E-06 | 0.0104 | 1.74E-06 | 0.0104 | 2.06E-06 | 0.0104 | 2.01E-06 |
| 9900 | 0.0099 | 8.18E-07 | 0.0099 | 8.30E-07 | 0.0099 | 8.36E-07 | 0.0099 | 9.78E-07 | 0.0099 | 9.11E-07 |

TABLE III

Simulation Result of an 8-Descendant Tree, 6 of the 8 have Loss Rate=1% and the other 2 have Loss Rate=5%


presented in the tables for comparison. Due to space limitations, we present only part of the results in the tables: all of the means and variances for sample sizes from 300 to 3000 are included, whereas for sample sizes from 3300 to 9900 only two, i.e. 4800 and 9900, are presented.

Table II shows the results obtained from a tree with 8 subtrees connected to node $k$, where the loss rates of the subtrees are set to 1%. The results show that when the sample is small, the estimates obtained by all estimators drift away from the true value, which indicates that the data obtained are not yet stable. Once the sample size reaches 2100, the estimates approach the true value because the data stabilise around it. All of the estimators achieve the same outcome as the number of samples increases. In general, the variance decreases slowly as the number of samples increases, although there are a number of exceptions. This indicates that the original MLE and BWE have no significant advantage over IBE if the subtrees connected to the path of interest have the same loss rates. Therefore, by examining the pass rates of the paths connecting the source to the receivers of the subtrees, one is able to find the most suitable estimator.
| Estimators | Full Likelihood | | $A_k(2)$ | | $A_k(3)$ | | $Al_k(x), \lvert x\rvert = 2$ | | $Al_k(x), \lvert x\rvert = 3$ | |
|---|---|---|---|---|---|---|---|---|---|---|
| Samples | Mean | Var | Mean | Var | Mean | Var | Mean | Var | Mean | Var |
| 300 | 0.0503 | 2.15E-04 | 0.0504 | 2.15E-04 | 0.0505 | 2.14E-04 | 0.0508 | 2.18E-04 | 0.0505 | 2.16E-04 |
| 600 | 0.0503 | 8.23E-05 | 0.0503 | 8.21E-05 | 0.0503 | 8.19E-05 | 0.0504 | 8.24E-05 | 0.0503 | 8.27E-05 |
| 900 | 0.0511 | 5.85E-05 | 0.0511 | 5.81E-05 | 0.0511 | 5.79E-05 | 0.0512 | 5.79E-05 | 0.0512 | 5.88E-05 |
| 1200 | 0.0506 | 4.93E-05 | 0.0506 | 4.97E-05 | 0.0507 | 4.99E-05 | 0.0507 | 4.85E-05 | 0.0507 | 4.93E-05 |
| 1500 | 0.0502 | 2.24E-05 | 0.0502 | 2.24E-05 | 0.0502 | 2.23E-05 | 0.0503 | 2.33E-05 | 0.0502 | 2.32E-05 |
| 1800 | 0.0500 | 3.89E-05 | 0.0500 | 3.85E-05 | 0.0500 | 3.83E-05 | 0.0501 | 3.91E-05 | 0.0500 | 3.94E-05 |
| 2100 | 0.0507 | 1.16E-05 | 0.0507 | 1.19E-05 | 0.0507 | 1.20E-05 | 0.0507 | 1.09E-05 | 0.0507 | 1.13E-05 |
| 2400 | 0.0510 | 1.40E-05 | 0.0510 | 1.43E-05 | 0.0510 | 1.44E-05 | 0.0510 | 1.40E-05 | 0.0510 | 1.43E-05 |
| 2700 | 0.0507 | 1.31E-05 | 0.0507 | 1.34E-05 | 0.0507 | 1.35E-05 | 0.0508 | 1.35E-05 | 0.0507 | 1.34E-05 |
| 3000 | 0.0508 | 6.65E-06 | 0.0508 | 6.98E-06 | 0.0508 | 7.14E-06 | 0.0508 | 6.79E-06 | 0.0508 | 6.85E-06 |
| 4800 | 0.0498 | 1.09E-05 | 0.0498 | 1.10E-05 | 0.0498 | 1.10E-05 | 0.0498 | 1.11E-05 | 0.0498 | 1.11E-05 |
| 9900 | 0.0496 | 5.35E-06 | 0.0496 | 5.38E-06 | 0.0497 | 5.40E-06 | 0.0496 | 5.48E-06 | 0.0496 | 5.48E-06 |

TABLE IV

Simulation Result of an 8-Descendant Tree, the loss rate of the root link=5%, 4 of the 8 have Loss Rate=1% and the other 4 have Loss Rate=5%


To see the impact of different subtree loss rates on the estimates, another round of simulation is carried out on the same network topology. The difference between this round and the previous one lies in the loss rates of the subtrees connected to the path of interest: 6 of the 8 subtrees have loss rates equal to 1% and the other two have loss rates equal to 5%. The two subtrees selected by the paired local estimator have loss rates equal to 1% and 5%, respectively. Two of the three subtrees used by $Al_k(x), |x| = 3$ have loss rates equal to 1% and the other has a loss rate equal to 5%. The results are presented in Table III. Comparing Table III with Table II, there is no change for the original MLE and there are slight changes for the estimators of the pairwise likelihood ($A_k(2)$) and the triplet-wise likelihood ($A_k(3)$). In contrast, the variances and the means of the other two differ noticeably from their counterparts, in particular if the sample size is smaller than 1000. In addition, the two generally have slightly higher variances than those obtained in the first round. This indicates the sensitivity of the estimators in IBE to the selection of observations or observers for estimation. If applying the result derived from (33), we should first examine the pass rates of the paths connecting the source to the subtrees. Then, the subtrees that have loss rates equal to 1% would be selected for $Al_k(x), |x| = 2$ or 3. If so, the same results as those in the rightmost four columns of Table II would be obtained, which are certainly better than those in Table III.

To further investigate the impact of loss rates on estimation, we conduct a third round of simulation, where the loss rate of the path of interest is increased from 1% to 5%, the loss rates of four subtrees are set to 5%, and those of the other four to 1%. The two estimators from IBE, i.e. $Al_k(x), |x| = 2$ and 3, consider the observations obtained from the subtrees that have a 1% loss rate. The results are presented in Table IV. They differ from the previous two tables in the estimated variances, which are an order of magnitude higher than those of the previous two regardless of the estimator. This is actually an expected result of (33), i.e., a smaller $\delta_k(x)$ results in a bigger variance.
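The qualitative behaviour reported in the tables, a variance that shrinks with the sample size and grows as $\delta_k(x)$ falls, can be reproduced with a much simplified Monte Carlo experiment. The sketch below is not the simulation used in the paper and does not implement any of the five estimators: it assumes that $\delta_k(x)$ is known, draws each probe as a Bernoulli trial with success probability $A_k\delta_k(x)$, and estimates $A_k$ as the observed fraction divided by $\delta_k(x)$. For $n$ independent probes, the variance of this simplified estimator equals the value of (33) divided by $n$. All names and parameter values are illustrative assumptions.

```python
import random
from statistics import mean, variance


def simulate_estimates(a_k, delta, n_probes, n_experiments, seed=0):
    """Run repeated experiments and return one estimate of A_k per run.

    Simplifying assumption: delta is known, so the estimate is the
    fraction of observed probes divided by delta.
    """
    rng = random.Random(seed)
    p_observe = a_k * delta  # probability that a probe is observed
    estimates = []
    for _ in range(n_experiments):
        observed = sum(rng.random() < p_observe for _ in range(n_probes))
        estimates.append(observed / (n_probes * delta))
    return estimates


def predicted_variance(a_k, delta, n_probes):
    """Value of (33) divided by the number of independent probes."""
    return a_k * (1.0 - a_k * delta) / (delta * n_probes)


if __name__ == "__main__":
    a_k = 0.95  # pass rate of the path of interest (5% loss), illustrative
    for n_probes in (300, 3000):
        for delta in (0.99, 0.95):
            est = simulate_estimates(a_k, delta, n_probes,
                                     n_experiments=20, seed=n_probes)
            print(n_probes, delta, mean(est), variance(est),
                  predicted_variance(a_k, delta, n_probes))
```

With only 20 experiments per setting, as in the tables, the empirical variance fluctuates noticeably around the predicted value, which is consistent with the non-monotonic entries observed in the variance columns.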


VIII. Conclusion 

This paper starts from finding inspirations that can lead to efficient explicit estimators for loss tomography and ends with a 
large number of unbiased and consistent explicit estimators, plus a number of theorems and corollaries to assure the statistical 




























properties of the estimators. One of the most important findings is the set of formulae for computing the variances of $A_k$ estimated by the estimators in RSE, IBE and the original MLE. Apart from clearly expressing the connection between the path to be estimated and the subtrees connecting the path to the observers of interest, the formulae potentially have many applications in network tomography, some of which have been identified in this paper. For instance, using the formulae, we have ranked the MLEs proposed so far, including those proposed in this paper. In addition, the formulae make model selection possible in loss tomography, so the multicast used in end-to-end measurement no longer serves only to create various correlations but also to identify the subtrees that can be used in estimation. The effectiveness of the strategy has been verified in a simulation study. Despite this, the full potential of the formulae has not yet been realised and requires further exploration.